[NFC] Use pascal-style string storage for IString/Name#8662
[NFC] Use pascal-style string storage for IString/Name#8662kripken wants to merge 22 commits intoWebAssembly:mainfrom
Conversation
| if (s.data() == nullptr) { | ||
| return nullptr; | ||
| } |
There was a problem hiding this comment.
Is is not valid for empty string views to have nullptr data pointers?
There was a problem hiding this comment.
It is valid, though we'd need to be careful with hashing etc. Anyhow, this empty string does not need any storage, so it seems simpler to just skip it (might also be faster but I didn't measure).
There was a problem hiding this comment.
But if someone tries to intern the string_view {nullptr, 0} and we create a View at nullptr, then trying to turn that View back into a string_view will segfault. That seems bad, right?
There was a problem hiding this comment.
Can't a std::string_view be null (internal pointer is nullptr)? Where would it segfault?
There was a problem hiding this comment.
Where it tries to dereference the pointer in the View (actually 4 bytes before it) to get the size.
There was a problem hiding this comment.
Oh right, View needs to special-case a null to avoid a read. I also see that passing nullptr, 0 is not defined behavior anyhow. Fixed.
| char operator[](int x) const { return str[x]; } | ||
|
|
||
| explicit operator bool() const { return str.data() != nullptr; } | ||
| explicit operator bool() const { return str.internal != nullptr; } |
There was a problem hiding this comment.
Interned empty string views can have str.internal == nullptr, but now that is interpreted as the absence of a name rather than the presence of an empty name. I don't think this is the intended behavior.
There was a problem hiding this comment.
An empty name is non-null internal pointing to some place, and that place has a size of 0? I think this should be ok. Pushed a unit test now, did I do something wrong there?
| auto null = IString(); | ||
| auto empty = IString(""); | ||
| EXPECT_NE(null, empty); |
There was a problem hiding this comment.
We also need this:
auto default = IString(std::string_view{});
This should be the same as empty, but I think it will currently be the same as null.
There was a problem hiding this comment.
Hmm, why should that be empty? I believe the default std::string_view is a null. The test I just added confirms that unless I'm confused.
There was a problem hiding this comment.
Well std::string_view{} == std::string_view("") holds, so the interned versions of those should be the same as well.
There was a problem hiding this comment.
Hmm, supporting that would add overhead. Do you think it's worth it?
There was a problem hiding this comment.
There shouldn't be any extra overhead except the first time we intern the empty stringref. We just need to remove the early return I commented on here and let the default empty string_view get interned like any other string_view.
There was a problem hiding this comment.
That almost works but we also need to handle the case of comparing that interned string to an IString(), which never had anything interned for it, and just contains nullptr. That requires either
- Making
IString()(the default value throughout the codebase) callinterned()to intern. - Make
operator==return "equal" if both sizes are 0, no matter what data they point to.
I'd rather not add either overhead. I don't think we must follow std::string_view's API. (Instead, we would be following other languages that do differentiate the null string from the empty string, so it's not like I'm suggesting some chaotic behavior 😄 )
There was a problem hiding this comment.
But the interned empty string_view won't have nullptr data, so it won't be confused with a default IString, which will have nullptr data, right? It's expected that IString() is not equal to IString(std::string_view{}) because the latter should be equal to IString("").
There was a problem hiding this comment.
Oh, maybe I misunderstood you before then. When you said
Well std::string_view{} == std::string_view("") holds, so the interned versions of those should be the same as well.
Did you mean "IF we interned the null string it should compare equal to the empty string (BUT we don't so things are ok as is?)"
If not, then I am confused. What concrete change to the API are you asking for?
There was a problem hiding this comment.
(Note that I added a test early in this thread - is that test wrong in your opinion? What should the test actually check at the API level?)
There was a problem hiding this comment.
I would expect these test expectations to pass both before and after this refactoring:
ASSERT_EQ(IString(""), IString(std::string_view{})); // !
ASSERT_NE(IString(), IString(""));
ASSERT_NE(IString(), IString(std::string_view{})); // !
ASSERT_FALSE(IString().is());
ASSERT_TRUE(IString("").is());
ASSERT_TRUE(IString(std::string_view{}).is()); // !
My understanding of the current change is that the assertions involving the default string_view will fail. Since this change is supposed to be NFC, this is a bug.
Previously our interned strings were std::string_view, which means a
pointer and a size. Instead, store the size as a header alongside the
string data. As we have many views on the same data, this reduces
memory usage, basically anywhere we use a Name, which is every
Call, GlobalSet, Load, etc.
I see a 5% RAM reduction, and a small improvement in instructions
and branches. Wall time improves by around 1% though that is
close to the noise.
This removes the "reuse" optimization where we could reuse a
string from the input. That was error-prone and a bad idea anyhow.
In practice, it might have helped a little, but this new model is simpler
and saves a lot more. (We can't reuse now since we need to convert
to pascal-style storage anyhow.)